Scaling Semantic Frame Annotation

Authors

  • Nancy Chang
  • Praveen Paritosh
  • David Huynh
  • Collin Baker
Abstract

Large-scale data resources needed for progress toward natural language understanding are not yet widely available and typically require considerable expense and expertise to create. This paper addresses the problem of developing scalable approaches to annotating semantic frames and explores the viability of crowdsourcing for the task of frame disambiguation. We present a novel supervised crowdsourcing paradigm that incorporates insights from human computation research designed to accommodate the relative complexity of the task, such as exemplars and real-time feedback. We show that non-experts can be trained to perform accurate frame disambiguation, and can even identify errors in gold data used as the training exemplars. Results demonstrate the efficacy of this paradigm for semantic annotation requiring an intermediate level of expertise.

1 The semantic bottleneck

Behind every great success in speech and language lies a great corpus, or at least a very large one. Advances in speech recognition, machine translation and syntactic parsing can be traced to the availability of large-scale annotated resources (Wall Street Journal, Europarl and Penn Treebank, respectively) providing crucial supervised input to statistically learned models. Semantically annotated resources have been comparatively harder to come by: representing meaning poses myriad philosophical, theoretical and practical challenges, particularly for general purpose resources that can be applied to diverse domains. If these challenges can be addressed, however, semantic resources hold significant potential for fueling progress beyond shallow syntax and toward deeper language understanding.

This paper explores the feasibility of developing scalable methodologies for semantic annotation, inspired by three strands of work. First, frame semantics, and its instantiation in the Berkeley FrameNet project (Fillmore and Baker, 2010), offers a principled approach to representing meaning. FrameNet is a lexicographic resource that captures syntactic and semantic generalizations that go beyond surface form and part of speech, famously including the relationships among words like buy, sell, purchase and price. These rich structural relations provide an attractive foundation for work in deeper natural language understanding and inference, as attested by the breadth of applications at the Workshop in Honor of Chuck Fillmore at ACL 2014 (Petruck and de Melo, 2014). But FrameNet was not designed to support scalable language technologies; indeed, it is perhaps a paradigm example of a hand-curated knowledge resource, one that has required significant expertise, training, time and expense to create and that remains under development.

Second, the task of automatic semantic role labeling (ASRL) (Gildea and Jurafsky, 2002) serves as an applied counterpart to the ideas of frame semantics. Recent progress has demonstrated the viability of training automated models using frame-annotated data (Das et al., 2013; Das et al., 2010; Johansson and Nugues, 2006). Results based on FrameNet data have been limited by its incomplete

Similar resources

Integrating Event Frame Annotation into the Open Ontology Forge Annotation Tool

In this paper, we propose a scheme for event frame annotation integrated into the Open Ontology Forge (OOF) annotation tool. This is a key requirement for realization of knowledge description on the Semantic Web. Semantic information contained in each event frame is a set of relationships between a predicate and its arguments. As our aim is to keep OOF flexible for various types of annotation p...

Frame-Semantic Annotation on a Parallel Treebank

This paper reports on experiments in frame-semantic annotation of a parallel treebank. Selected English and Swedish sentences that contained verbs of motion and communication were annotated independently by two annotators. We found that they assigned the same frame to corresponding sentences in 52% of the cases. This leads us to the conclusion that parallel treebanks can save considerable effor...

Semantic Frame Annotation on the French MEDIA corpus

This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semantic structures from basic semantic constituents us...

A Radically Simple, Effective Annotation and Alignment Methodology for Semantic Frame Based SMT and MT Evaluation

We introduce a radically simple yet effective methodology for annotating and aligning semantic frames inexpensively using untrained lay annotators that is ideally suited for practical semantic SMT and evaluation applications. For example, recent work by Lo and Wu (2011) introduced MEANT and HMEANT, which are state-of-the-art metrics that evaluate translation meaning preservation via Propbank s...

Getting Deeper Semantics than Berkeley FrameNet with MSFA

This paper illustrates relevant details of ongoing semantic-role annotation work based on a framework called MULTILAYERED/DIMENSIONAL SEMANTIC FRAME ANALYSIS (MSFA for short) (Kuroda and Isahara, 2005b), which is inspired by, if not derived from, the Frame Semantics/Berkeley FrameNet approach to semantic annotation (Lowe et al., 1997; Johnson and Fillmore, 2000).

Publication date: 2015